Adding data from tables | Manticore Search Manual

In many situations, the total dataset is too large to be frequently rebuilt from scratch, while the number of new records remains relatively small. For example, a forum may have 1,000,000 archived posts but only receive 1,000 new posts per day.

In such cases, implementing "live" (nearly real-time) table updates can be achieved using a "main+delta" scheme.

The concept involves setting up two sources and two tables, with one "main" table for data that rarely changes (if ever), and one "delta" table for new documents. In the example, the 1,000,000 archived posts would be stored in the main table, while the 1,000 new daily posts would be placed in the delta table. The delta table can then be rebuilt frequently, making the documents available for searching within seconds or minutes. Determining which documents belong to which table and rebuilding the main table can be fully automated. One approach is to create a counter table that tracks the ID used to split the documents and update it whenever the main table is rebuilt.

Using a timestamp column as the split variable is more effective than using the ID since timestamps can track not only new documents but also modified ones.

For datasets that may contain modified or deleted documents, the delta table should provide a list of affected documents, ensuring they are suppressed and excluded from search queries. This is accomplished using a feature called Kill Lists. The document IDs to be killed can be specified in an auxiliary query defined by sql_query_killlist. The delta table must indicate the target tables for which the kill lists will be applied using the killlist_target directive. The impact of kill lists is permanent on the target table, meaning that even if a search is performed without the delta table, the suppressed documents will not appear in the search results.

Notice how we're overriding sql_query_pre in the delta source. We must explicitly include this override. If we don't, the REPLACE query would be executed during the delta source's build as well, effectively rendering it useless.

‹›

Example

Example

📋

# in MySQL
CREATE TABLE deltabreaker (
  index_name VARCHAR(50) NOT NULL,
  created_at TIMESTAMP NOT NULL  DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
  PRIMARY KEY (index_name)
);
# in manticore.conf
source main {
  ...
  sql_query_pre = REPLACE INTO deltabreaker SET index_name = 'main', created_at = NOW()
  sql_query =  SELECT id, title, UNIX_TIMESTAMP(updated_at) AS updated FROM documents WHERE deleted=0 AND  updated_at  >=FROM_UNIXTIME($start) AND updated_at  <=FROM_UNIXTIME($end)
  sql_query_range  = SELECT ( SELECT UNIX_TIMESTAMP(MIN(updated_at)) FROM documents) min, ( SELECT UNIX_TIMESTAMP(created_at)-1 FROM deltabreaker WHERE index_name='main') max
  sql_query_post_index = REPLACE INTO deltabreaker set index_name = 'delta', created_at = (SELECT created_at FROM deltabreaker t WHERE index_name='main')
  ...
  sql_attr_timestamp = updated
}
source delta : main {
  sql_query_pre =
  sql_query_range = SELECT ( SELECT UNIX_TIMESTAMP(created_at) FROM deltabreaker WHERE index_name='delta') min, UNIX_TIMESTAMP() max
  sql_query_killlist = SELECT id FROM documents WHERE updated_at >=  (SELECT created_at FROM deltabreaker WHERE index_name='delta')
}
table main {
  path = /var/lib/manticore/main
  source = main
}
table delta {
  path = /var/lib/manticore/delta
  source = delta
  killlist_target = main:kl
}

Merging tables

Last modified: August 28, 2025

≫ Adding data from tables

Merging two existing plain tables can be more efficient than indexing the data from scratch and might be desired in some cases (such as merging 'main' and 'delta' tables instead of simply rebuilding 'main' in the 'main+delta' partitioning scheme). Thus,indexer provides an option to do that. Merging tables is typically faster than rebuilding, but still not instant for huge tables. Essentially, it needs to read the contents of both tables once and write the result once. Merging a 100 GB and 1 GB table, for example, will result in 202 GB of I/O (but that's still likely less than indexing from scratch requires).

The basic command syntax is as follows:

sudo -u manticore indexer --merge DSTINDEX SRCINDEX [--rotate] [--drop-src]

Unless --drop-src is specified, only the DSTINDEX table will be affected: the contents of SRCINDEX will be merged into it.

The --rotate switch is required if DSTINDEX is already being served by searchd.

The typical usage pattern is to merge a smaller update from SRCINDEX into DSTINDEX. Thus, when merging attributes, the values from SRCINDEX will take precedence if duplicate document IDs are encountered. However, note that the "old" keywords will not be automatically removed in such cases. For example, if there's a keyword "old" associated with document 123 in DSTINDEX, and a keyword "new" associated with it in SRCINDEX, document 123 will be found by both keywords after the merge. You can supply an explicit condition to remove documents from DSTINDEX to mitigate this; the relevant switch is --merge-dst-range:

sudo -u manticore indexer --merge main delta --merge-dst-range deleted 0 0

This switch allows you to apply filters to the destination table along with merging. There can be several filters; all of their conditions must be met in order to include the document in the resulting merged table. In the example above, the filter passes only those records where 'deleted' is 0, eliminating all records that were flagged as deleted.

--drop-src enables dropping SRCINDEX after the merge and before rotating the tables, which is important if you specify DSTINDEX in killlist_target of DSTINDEX. Otherwise, when rotating the tables, the documents that have been merged into DSTINDEX may be suppressed by SRCINDEX.

Main+delta schema Killlists in plain tables

Last modified: August 28, 2025

When using plain tables, there's a challenge arising from the need to have the data in the table as fresh as possible.

In this case, one or more secondary (also known as delta) tables are used to capture the modified data between the time the main table was created and the current time. The modified data can include new, updated, or deleted documents. The search becomes a search over the main table and the delta table. This works seamlessly when you just add new documents to the delta table, but when it comes to updated or deleted documents, there remains the following issue.

If a document is present in both the main and delta tables, it can cause issues during searching, as the engine will see two versions of a document and won't know how to pick the right one. So, the delta needs to somehow inform the search that there are deleted documents in the main table that should be disregarded. This is where kill lists come in.

A table can maintain a list of document IDs that can be used to suppress records in other tables. This feature is available for plain tables using database sources or plain tables using XML sources. In the case of database sources, the source needs to provide an additional query defined by sql_query_killlist. It will store in the table a list of documents that can be used by the server to remove documents from other plain tables.

This query is expected to return a number of 1-column rows, each containing just the document ID.

In many cases, the query is a union between a query that retrieves a list of updated documents and a list of deleted documents, e.g.:

sql_query_killlist = \
    SELECT id FROM documents WHERE updated_ts>=@last_reindex UNION \
    SELECT id FROM documents_deleted WHERE deleted_ts>=@last_reindex

A plain table can contain a directive called killlist_target that will tell the server it can provide a list of document IDs that should be removed from certain existing tables. The table can use either its document IDs as the source for this list or provide a separate list.

Sets the table(s) that the kill-list will be applied to. Optional, default value is empty.

When you use plain tables you often need to maintain not just a single table, but a set of them to be able to add/update/delete new documents sooner (read about delta table updates). n order to suppress matches in the previous (main) table that were updated or deleted in the next (delta) table, you need to:

Create a kill-list in the delta table using sql_query_killlist
Specify main table as killlist_target in delta table settings:

‹›

CONFIG

CONFIG

📋

table products {
  killlist_target = main:kl
  path = products
  source = src_base
}

When killlist_target is specified, the kill-list is applied to all the tables listed in it on searchd startup. If any of the tables from killlist_target are rotated, the kill-list is reapplied to these tables. When the kill-list is applied, tables that were affected save these changes to disk.

killlist_target has 3 modes of operation:

killlist_target = main:kl. Document IDs from the kill-list of the delta table are suppressed in the main table (see sql_query_killlist).
killlist_target = main:id. All document IDs from the delta table are suppressed in the main table. The kill-list is ignored.
killlist_target = main. Both document IDs from the delta table and its kill-list are suppressed in the main table.

Multiple targets can be specified, separated by commas like:

killlist_target = table_one:kl,table_two:kl

You can change the killlist_target settings for a table without rebuilding it by using ALTER.

However, since the 'old' main table has already written the changes to disk, the documents that were deleted in it will remain deleted even if it is no longer in the killlist_target of the delta table.

‹›

SQL
HTTP

📋

ALTER TABLE delta KILLLIST_TARGET='new_main_table:kl'

Merging tables Attaching one table to another

Last modified: August 28, 2025

Main+delta schema

≫ Adding data from tables

Merging tables

Killlist in plain tables

Table kill-list

Removing documents in a plain table

killlist_target